Here, we report the three-dimensional structure of severe acute respiratory syndrome coronavirus (SARS-CoV)
nsP7, a component of the SARS-CoV replicase polyprotein. The coronavirus replicase carries out
regulatory tasks involved in the maintenance, transcription, and replication of the coronavirus genome. nsP7
was found to assume a compact architecture in solution, which is comprised primarily of helical secondary
structures. Three helices (α2 to α4) form a flat up-down-up antiparallel α-helix sheet. The N-terminal segment
of residues 1 to 22, containing two turns of α-helix and one turn of 3₁₀-helix, is packed across the surface of
α2 and α3 in the helix sheet, with the α-helical region oriented at a 60° angle relative to α2 and α3. The surface
charge distribution is pronouncedly asymmetrical, with the flat surface of the helical sheet showing a large
negatively charged region adjacent to a large hydrophobic patch and the opposite side containing a positively
charged groove that extends along the helix α1. Each of these three areas is thus implicated as a potential site
for protein-protein interactions.


The severe acute respiratory syndrome coronavirus (SARS-CoV)
most closely resembles the group II coronaviruses, which
infect mice, rats, pigs, and humans (34). Upon viral entry, the
~30-kb SARS-CoV genome is translated to produce a predicted
486-kDa polyprotein (PP1a) as well as a longer form of
the polyprotein containing a predicted 304-kDa carboxyl-terminal
extension (PP1ab) that is generated via a ribosomal
frameshift event (38). The PP1ab form of the replicase
polyprotein contains enzymatic signatures likely involved in
RNA replication and processing. The short and long forms of
the polyprotein are proteolytically processed into about 16
mature polypeptides (nonstructural proteins; nsP) by proteinases
encoded in PP1a (31, 34). These polypeptides form the
subunits of a replicase complex that associates with intracellular
membranes and is responsible for replication of the viral
genome at defined intracellular sites.

The functions of the nonstructural proteins of the replicase
complexes of coronaviruses are, as yet, poorly defined. Several
proteins are predicted to be involved in RNA processing. For
example, nsP13 has confirmed helicase activity (13, 33) and
nsP12 is predicted to be an RNA-dependent RNA polymerase
(34), while several other proteins have, as yet, no functional
annotation. In addition to their implicated roles in viral genome
replication and subgenomic RNA synthesis, new roles
for nsP as determinants of pathogenicity have been postulated
(22). For example, certain viruses modulate host cell signaling
pathways to down-regulate the immune response, modify cytokine
secretion, and allow greater viral proliferation (1). Host
cell apoptotic pathways may also be up- or down-regulated (3).
Thus, the overexpression of a protein unique to SARS-CoV,
the accessory protein ORF7a, has been shown to stimulate
apoptosis via a caspase-dependent pathway (37). Interactions
between nsP1 of equine arteritis virus (Nidovirales order) and
host cell transcription regulatory factors have been demonstrated
(39).

The consortium for Functional and Structural Proteomics of
SARS-CoV-related proteins (http://sars.scripps.edu) was established
to provide structural information for SARS-CoV
proteins and to characterize their protein-protein and protein-nucleic
acid interactions. A structural genomics approach originally
developed for Thermotoga maritima (reference 17 and
http://www.jcsg.org) was adapted for the 28 proteins encoded
by the SARS genome. A bioinformatics approach to domain
identification and definition within the genome was used to
design 163 constructs, most of which were cloned and tested
for soluble expression in a small-scale structural genomics expression
system (28). Several expressing constructs were biophysically
analyzed using one-dimensional (1D) ¹H nuclear
magnetic resonance (NMR) screening (29, 30), and the first
target selected from this screen for structure determination
was the nonstructural protein 7 (nsP7).

Here, we describe the structure of the predicted nsP7 domain
of the polyprotein 1ab (PP1ab). nsP7 is conserved within
the Coronaviridae and has no detectable orthologs outside this
family of viruses. nsP7 is contained within the portion of PP1ab
thought to comprise a replication complex and is an 83-amino-acid
polypeptide predicted to include four α-helices. The expression
of nsP7 in infected cells, and its localization to cytoplasmic,
membrane-containing foci thought to be sites of viral
RNA replication, have been demonstrated for cells infected
with mouse hepatitis virus (MHV) (2), human coronavirus
229E (44), and avian infectious bronchitis virus (23). Furthermore,
the MHV nsP7 has been shown to interact specifically
with nsP10, a protein also specific to coronaviruses, and with
nsP1, a protein thought to be involved in viral replication and
assembly (4). These data indicate a function of nsP7 in coronavirus-specific
RNA replication mechanisms.

MATERIALS AND METHODS 

Cloning of the SARS-CoV proteome. Vero-E6 cells were inoculated with
SARS-CoV strain Tor2 (GenBank accession number NC_004718) at a multiplicity
of ~10 PFU/cell. Twenty-four hours after inoculation, cells were lysed with
TRIzol (Invitrogen) and RNA was extracted according to the manufacturer's
protocol. First-strand cDNA synthesis using murine leukemia virus reverse
transcriptase (Invitrogen) was primed with random hexamer, oligonucleotide T, or
SARS-CoV-specific oligonucleotides. SARS-specific primers used for first-strand
synthesis were as follows: SARS-1r, CTTCAGGTGTAGGTTCTGG; SARS-4r,
CAGTCTTTAATAATGATTGGC; SARS-7r, GAGTTAAATAAAGAGTGTCTG;
and SARS-10r, TTTTTTTTTTGTCATTCTCC. Full-length nsP7 amplicons
were obtained by PCR using the following primers: SARS070f,
ATGTCTAAAATGTCTGACGTAAAGTGCACATCTG; and SARS070r,
CCCGGCCGGCCCTACTGAAGAGTAGCACGGTTATCG. SARS-CoV cDNA was cloned
into pMHIF, which is a customized expression vector derived from pBAD
(Invitrogen). Expression in pMHIF is driven by the araBAD promoter, and the
recombinant protein is produced with a Thio₆His₆ tag (MGSDKIHHHHHH) at
its N terminus. Four designed nsP7 constructs were transformed into the
methionine auxotrophic Escherichia coli strain DL41, and microexpression trials were
conducted as described previously (28), using 2×YT as the growth medium and
0.2% (wt/vol, final concentration) L-arabinose (Sigma, St. Louis, MO) as the
inducer at different temperatures. Cell pellets were lysed by resuspension and
incubation at room temperature for 15 min in a solution of lysozyme (1 mg/ml)
in 50 mM Tris-HCl, pH 7.5, with 50 mM sucrose, 1 mM EDTA, and 0.25 µl/ml
Benzonase endonuclease. Equal volumes of 10 mM Tris-HCl (pH 7.5) with 50
mM KCl, 10 mM MgCl₂, and 1 mM EDTA were added, and the suspensions
were incubated for a further 15 min at room temperature. Cell debris was
collected by centrifugation, and the soluble protein fractions were evaluated by
sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE). Soluble
protein constructs were selected for larger-scale fermentation and further
evaluation. Larger-scale fermentation was carried out in a fermenter equipped
for 65-ml cultures (17) with terrific broth as the growth medium; otherwise,
growth conditions were as described for microexpression. Proteins were purified
by IMAC Co²⁺-affinity chromatography (Talon resin; Clontech) and by a second
ion exchange step. Purified proteins were subjected to 1D ¹H NMR screening
(29, 30).
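
As a quick consistency check on the oligonucleotides listed above, the following minimal sketch computes the length, GC fraction, and reverse complement of the two nsP7 PCR primers. The helper names are illustrative and not part of the original protocol; the sequences are copied from the text with the line-break spaces removed.

```python
# Illustrative helpers; names are assumptions, not part of the published protocol.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

# nsP7 PCR primers as listed in the text (line-break spaces removed).
PRIMERS = {
    "SARS070f": "ATGTCTAAAATGTCTGACGTAAAGTGCACATCTG",
    "SARS070r": "CCCGGCCGGCCCTACTGAAGAGTAGCACGGTTATCG",
}


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def gc_fraction(seq):
    """Return the fraction of G and C bases in the sequence."""
    return sum(base in "GC" for base in seq) / len(seq)


def summarize(primers):
    """Collect length, GC fraction, and reverse complement per primer."""
    return {
        name: {
            "length": len(seq),
            "gc": gc_fraction(seq),
            "revcomp": reverse_complement(seq),
        }
        for name, seq in primers.items()
    }


if __name__ == "__main__":
    for name, info in summarize(PRIMERS).items():
        print(name, info["length"], f"{info['gc']:.2f}", info["revcomp"])
```

The reverse complement of the reverse primer is the form in which its annealing region would appear on the coding strand, which makes it easier to compare against the deposited genome sequence.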

Production of nsP7 for NMR spectroscopy. A construct representing nsP7 with
an extra N-terminal dipeptide Gly-His (residues 1 and 2) was subcloned into a
vector derived from pET-28 (Novagen), which encodes a Thio₆His₆ expression/purification
tag (MGSDKIHHHHHH) and a tobacco etch virus (TEV) cleavage
site (ENLYFQGH). This plasmid was transformed into the E. coli strain
BL21-CodonPlus (DE3)-RIL (Stratagene). The expression of uniformly ¹⁵N-labeled
and ¹³C,¹⁵N-labeled nsP7 was carried out by growing freshly transformed cells in
M9 minimal medium containing 1 g/liter ¹⁵NH₄Cl and 4 g/liter [¹³C₆]-D-glucose
as the sole nitrogen and carbon sources, respectively. Cell cultures were grown at
37°C with vigorous shaking to an optical density at 600 nm of 0.6 to 0.7. The
temperature was slowly lowered to 18°C, and after induction with 1 mM
isopropyl-β-D-thiogalactopyranoside, the cell cultures were grown for 18 h. The cells
were harvested by centrifugation, resuspended in extraction buffer (50 mM
Tris-HCl at pH 8.0, 5 mM imidazole, 500 mM NaCl, 0.1% Triton X-100, and
Complete protease inhibitor tablets [Roche]), and lysed by sonication. The cell
debris was removed by centrifugation. For the first purification step, the soluble
protein was loaded onto a HisTrap HP column (Pharmacia), equilibrated with 50
mM Tris-HCl at pH 8.0, 5 mM imidazole, and 500 mM NaCl. The protein was
eluted with a 0 to 250 mM imidazole gradient. Fractions containing nsP7 were
pooled and buffer exchanged against 50 mM sodium phosphate at pH 7.5 with
300 mM NaCl and concentrated to a total volume of about 10 ml. After the
addition of TEV NIa protease, the solution was vigorously shaken for 3 to 6 h at
room temperature. The progression of the TEV cleavage was tested by SDS-PAGE
analysis. After cleavage was at least 95% complete, the sample was again
concentrated and loaded onto a precalibrated (50 mM sodium phosphate at pH
7.5, 300 mM NaCl) IMAC column (Talon resin; Clontech). The fractions containing
nsP7 were again pooled, the homogeneity of the purified protein was
evaluated by SDS-PAGE, the solution was concentrated to a final volume of
550 µl, and dithiothreitol-d₁₀ was added at a concentration of 5 mM. The final
concentrations of nsP7 in the different NMR samples were between 1.0 and 3.5
mM.

Data collection. NMR measurements were performed at 298 K with Bruker
Avance600 and Avance900 spectrometers, using TXI-HCN-z or TXI-HCN-xyz
gradient probe heads. Proton chemical shifts were referenced to internal
3-(trimethylsilyl)-1-propanesulfonic acid sodium salt (DSS). The ¹³C and ¹⁵N chemical
shifts were referenced indirectly to DSS, using the absolute frequency ratios.
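
Indirect referencing can be illustrated with a short sketch: the 0-ppm frequency of a heteronucleus is obtained by multiplying the measured ¹H frequency of DSS by a fixed ratio Ξ. The Ξ values below are the commonly tabulated ratios for DSS-based referencing of biomolecular samples; they are an assumption here, since the text does not quote them, and the function names and the 600.13-MHz example frequency are hypothetical.

```python
# Commonly tabulated frequency ratios relative to the 1H signal of DSS.
# Assumption: these values are not quoted in the text itself.
XI = {
    "13C": 0.251449530,
    "15N": 0.101329118,
}


def zero_ppm_frequency(dss_proton_mhz, nucleus):
    """Frequency (MHz) corresponding to 0 ppm for the given heteronucleus."""
    return dss_proton_mhz * XI[nucleus]


def chemical_shift_ppm(observed_mhz, zero_mhz):
    """Chemical shift in ppm of a resonance relative to the 0-ppm frequency."""
    return (observed_mhz - zero_mhz) / zero_mhz * 1e6


if __name__ == "__main__":
    dss_1h = 600.13  # hypothetical measured 1H frequency of DSS, in MHz
    for nucleus in ("13C", "15N"):
        print(nucleus, f"{zero_ppm_frequency(dss_1h, nucleus):.6f} MHz")
```

Because only the proton spectrum needs an internal standard, all three nuclei end up on a mutually consistent scale.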

Chemical shift assignment and structure calculation. The determination of
the 3D structure of a protein (41, 42) requires sequence-specific resonance
assignment (assignment of each ¹H, ¹³C, and ¹⁵N resonance frequency to a
specific atom within the protein), obtained by combining the results of several 2D
and 3D NMR experiments recorded on uniformly ¹³C- and ¹⁵N-enriched protein
samples. A separate set of experiments based on the nuclear Overhauser effect
(NOE) measures interatomic distances within the protein. These distances are
applied as restraints during molecular dynamics simulations, which also include
restraints on bond lengths and angles, chirality, planarity, and torsional angles, to
enforce correct geometry. A molecular dynamics protocol implemented in the
program DYANA, in which torsion angles rather than Cartesian coordinates are
the degrees of freedom, offers good sampling and convergence properties for
biomolecular structures (7-9). Calculations are repeated several times with different
randomized starting structures to yield an ensemble of conformers that are
representative of the conformation space sampled by the protein in solution.
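
The way distance restraints enter such a calculation can be sketched with a deliberately simplified penalty: each NOE upper limit contributes the square of the amount by which the current interatomic distance exceeds it, so the penalty has units of Å². This is an illustration only. The actual DYANA target function also includes lower limits, steric terms, angle restraints, and weighting factors, and the function and atom names below are hypothetical.

```python
import math


def upper_limit_penalty(coords, upper_limits):
    """Sum of squared violations (in A^2) of upper distance limits.

    coords: dict mapping an atom label to an (x, y, z) tuple in angstroms.
    upper_limits: dict mapping (atom_i, atom_j) to an upper limit in angstroms.
    Simplified sketch; not the full DYANA target function.
    """
    total = 0.0
    for (atom_i, atom_j), limit in upper_limits.items():
        violation = math.dist(coords[atom_i], coords[atom_j]) - limit
        if violation > 0.0:  # only distances beyond the limit are penalized
            total += violation ** 2
    return total


if __name__ == "__main__":
    # Hypothetical two-proton example: the pair is 5.0 A apart.
    coords = {"HA_30": (0.0, 0.0, 0.0), "HN_34": (3.0, 4.0, 0.0)}
    print(upper_limit_penalty(coords, {("HA_30", "HN_34"): 4.0}))
    print(upper_limit_penalty(coords, {("HA_30", "HN_34"): 6.0}))
```

A conformer that satisfies every restraint scores zero, which is why the conformers with the lowest residual values are the ones retained for the final bundle.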

The following spectra (32) were used to obtain sequence-specific backbone
and side chain assignments: 2D [¹H,¹⁵N]-heteronuclear single quantum coherence
(HSQC), 3D HNCA, 3D HNCACB, 3D CBCA(CO)NH, 3D HNCO, 3D
HBHA(CO)NH, 3D ¹⁵N-resolved [¹H,¹H]-total-correlation spectroscopy
(TOCSY), and 3D HC(C)H-TOCSY. 2D [¹H,¹H]-NOE spectroscopy (NOESY),
2D [¹H,¹H]-TOCSY, and 2D [¹H,¹H]-correlation spectroscopy (COSY) of an
nsP7 sample in D₂O solution after complete H/D exchange of the labile protons
were used to assign the aromatic side chains (41). The NMR spectra were
processed with XWINNMR3.5 (Bruker, Billerica, Mass.) and analyzed with
CARA (R. Keller et al., unpublished data).

The input for the structure calculation was collected from 3D ¹⁵N-resolved
[¹H,¹H]-NOESY and 3D ¹³C-resolved [¹H,¹H]-NOESY spectra recorded in H₂O
solution and a 2D [¹H,¹H]-NOESY spectrum recorded in D₂O solution. All
three NOESY spectra were measured at 900 MHz with a mixing time of 90 ms
and were automatically analyzed with a standalone version of the new software
package ATNOS/CANDID (version 0.9), which incorporates the functionalities
of the two algorithms ATNOS (11) for automated peak picking and NOE
identification in 2D homonuclear- and 3D heteronuclear-resolved [¹H,¹H]-NOESY
spectra, and CANDID (10) for automated NOE assignment. ATNOS/CANDID
was combined with the program DYANA (9), which was used to
perform the structure calculation with simulated annealing in torsion angle
space. The ATNOS/CANDID input consisted of the chemical shift lists obtained
from the sequence-specific resonance assignment and the three aforementioned
NOESY spectra. The standard protocol with seven cycles of peak picking, NOE
assignment, and 3D structure calculation was applied. At the outset of the
spectral analysis, ATNOS/CANDID used highly permissive criteria to identify a
comprehensive set of peaks in the NOESY spectra. Only the knowledge of the
covalent polypeptide structure and the chemical shift lists were exploited to guide
NOE cross peak identification, and ambiguous constraints (24) were used for the
NOE assignment. In the second and subsequent cycles, the intermediate protein
three-dimensional structures served as an additional guide for the interpretation
of the NOESY data. The ATNOS/CANDID output consisted of assigned NOE
peak lists for each input spectrum, and a final set of meaningful upper limit
distance constraints which constituted the input for the DYANA three-dimensional
structure calculation algorithm. For each cycle of structure calculation,
constraints on the backbone dihedral angles φ and ψ derived from the Cα
chemical shifts (19, 35) were added to the ATNOS/CANDID output. For the
final structure calculation in cycle 7, ATNOS/CANDID retained only distance
constraints that could be unambiguously assigned based on the protein
three-dimensional structure from cycle 6. The 20 conformers with the lowest residual
DYANA target function values obtained from cycle 7 were energy refined in a
water shell with the program OPALp (14, 18), using the AMBER force field (5).
The program MOLMOL (15) was used to analyze the protein structure and to
prepare the figures of the NMR structures.

Validation and data deposition. Analysis of the stereochemical quality of the 
models was accomplished using the Joint Center for Structural Genomics 
(JCSG) Validation Central suite (http://www.jcsg.org), which integrates seven 
validation tools: Procheck 3.5.4, SFcheck 4.0, Prove 2.5.1, ERRAT, WASP, 
DDQ 2.0, and Whatcheck. The ¹H, ¹³C, and ¹⁵N chemical shifts have been
deposited in the BioMagResBank (BMRB; http://www.bmrb.wisc.edu) under 
BMRB accession number 6513. 

Protein structure accession number. The atomic coordinates of the bundle of 
20 conformers used to represent the nsP7 structure have been deposited in the 
Protein Data Bank (http://www.rcsb.org/pdb/) with the code 1YSY. 

RESULTS AND DISCUSSION 

Structural genomics strategy. The SARS-CoV genome is
predicted to encode 28 proteins (34). Bioinformatics analyses
of these protein sequences involving prediction of secondary
structure, domain boundaries, and disordered regions (details
to be published elsewhere) yielded a set of 163 constructs,
which had a reasonable probability of yielding soluble recombinant
proteins while providing redundant coverage of the
proteome. These constructs were amenable to processing via a
high-throughput structural genomics pipeline adapted from
the JCSG (reference 17 and http://www.jcsg.org) that included
(i) PCR amplification of the constructs from a SARS cDNA
library and the cloning of these constructs into multiple expression
vectors, (ii) microexpression trials of the cloned constructs
in multiple E. coli strains using different induction temperatures,
(iii) large-scale fermentation of the expressing
clones, and (iv) purification of the expressed proteins by
Co²⁺-affinity purification and ion exchange chromatography.

Expression and solubility screening of nsP7 constructs.
Four alternate nsP7 constructs were designed based on the
prediction that the first seven and the last five residues of the
sequence would form disordered random coil segments. These
constructs were prepared as described in Materials and Methods
and tested for expression in E. coli. Soluble expression was
observed for full-length nsP7 and for the individually N- or
C-terminally truncated constructs, but not for the combined N-
and C-terminally truncated constructs. Since in this case no
improvement in expression or solubility resulted from truncation,
the full-length protein was selected for further studies.

1D ¹H NMR fold screening. A subset of the SARS-CoV
protein domain constructs that were successfully expressed in
E. coli was screened for globular folding in solution, using a 1D
¹H NMR screening approach developed for structural genomics
(29, 30). nsP7 was chosen as a promising target for NMR
structure determination based on a 1D spectrum indicative of
a well-folded protein.

Structure determination. Using an input consisting of the
chemical shift lists from the sequence-specific resonance assignments
of nsP7 and of the three NOESY spectra described
in Materials and Methods, the program package ATNOS/CANDID
yielded a total of 2,413 assigned NOE cross peaks in
the final cycle 7. These yielded 1,066 meaningful NOE upper
distance limits as input for the final structure calculation with
the program DYANA (Table 1) (Fig. 1). The low residual
DYANA target function value of 1.73 ± 0.30 Å² (Table 1) and
the average global root-mean-square deviation (RMSD) value
relative to the mean coordinates of 0.89 ± 0.19 Å calculated
for the backbone atoms of residues 6 to 83 in the bundle of Fig.
1a (Table 1) represent a high-quality NMR structure determination.
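
The "RMSD relative to the mean coordinates" quoted above can be computed as in the following minimal sketch. It assumes that the conformers have already been superimposed and that each is supplied as a list of (x, y, z) tuples for the same selected atoms; the function names are illustrative and the toy coordinates are not taken from the deposited structure.

```python
import math
import statistics


def mean_coordinates(conformers):
    """Average position of each atom over all (pre-superimposed) conformers."""
    n_conf = len(conformers)
    n_atoms = len(conformers[0])
    return [
        tuple(sum(conf[a][k] for conf in conformers) / n_conf for k in range(3))
        for a in range(n_atoms)
    ]


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long coordinate lists."""
    squared = [math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b)]
    return math.sqrt(sum(squared) / len(squared))


def rmsd_to_mean(conformers):
    """Mean and sample standard deviation of per-conformer RMSD to the mean.

    Requires at least two conformers.
    """
    mean = mean_coordinates(conformers)
    values = [rmsd(conf, mean) for conf in conformers]
    return statistics.mean(values), statistics.stdev(values)


if __name__ == "__main__":
    # Toy bundle of two conformers with two atoms each (hypothetical values).
    bundle = [
        [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
        [(2.0, 0.0, 0.0), (3.0, 0.0, 0.0)],
    ]
    print(rmsd_to_mean(bundle))
```

Restricting the atom selection to the backbone atoms of residues 6 to 83, as in Table 1, keeps the disordered N-terminal residues from dominating the value.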

NMR structure of nsP7. The most striking feature of nsP7 is
that three helices, α2 (29 to 42), α3 (47 to 65), and α4 (71 to
81), form a flat up-down-up three-α-helix sheet linked by two
short, well-defined loops with residues 43 to 46 and 66 to 70.
The stabilization of this unusual structural motif by side chain-side
chain interactions is discussed below. Overall, the solution
structure of nsP7 (Fig. 1) consists of a single domain that
contains a total of five helical secondary structures. Helix α1
spans the residues 11 to 17 and is connected via a two-amino-acid
linker with a 3₁₀-helix of residues 20 to 22. The 3₁₀-helix
leads, via a somewhat disordered loop of residues 23 to 28, to
α2 (Fig. 1a). The helix α1 is packed at a 60° angle against the
surface of α2 and α3 in the flat three-helix sheet, and the
3₁₀-helix runs parallel to α3.
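
The residue ranges given above can be cross-checked with a small sketch that derives the connecting segments from the helix boundaries. The element boundaries are taken from the text; the data layout and function names are illustrative assumptions.

```python
# Helical elements of nsP7 as (name, first residue, last residue), from the text.
ELEMENTS = [
    ("alpha1", 11, 17),
    ("3_10", 20, 22),
    ("alpha2", 29, 42),
    ("alpha3", 47, 65),
    ("alpha4", 71, 81),
]


def element_lengths(elements):
    """Number of residues in each helical element."""
    return {name: last - first + 1 for name, first, last in elements}


def connecting_segments(elements):
    """Residue ranges lying between consecutive helical elements."""
    segments = []
    for (name_a, _, last_a), (name_b, first_b, _) in zip(elements, elements[1:]):
        segments.append((name_a, name_b, last_a + 1, first_b - 1))
    return segments


if __name__ == "__main__":
    for name_a, name_b, start, end in connecting_segments(ELEMENTS):
        print(f"{name_a} -> {name_b}: residues {start} to {end}")
    print(element_lengths(ELEMENTS))
```

The derived segments reproduce the two-residue linker (18 to 19), the loop of residues 23 to 28, and the two short loops of residues 43 to 46 and 66 to 70 described in the text.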

The structure comparison programs DALI (12) and FATCAT
(43) at first indicated apparent statistically significant structural
similarity to several previously described folds. However,
the apparent similarities are all to helix bundles, where three
or four helices contribute nearly equally to form a tight core. In
nsP7, the arrangement of the helices α2 to α4 into a flat sheet
is unique in that no such arrangement of three sequentially
adjoining α-helices could be found in the SCOP (21) or CATH
(26) databases, indicating that nsP7 actually represents a novel
fold. The nsP7 fold may alternatively be viewed as comprising
an antiparallel three-helix bundle (helices α1 to α3) with an
additional helix (α4) added at the C terminus of the three-helix
bundle in an antiparallel orientation relative to α3.

FIG. 1. Wall-eye stereo views of the solution structure of nsP7. (a) Bundle of the best 20 DYANA conformers of nsP7 after energy
minimization. Only the polypeptide backbone is displayed. The 20 conformers have been superimposed for minimal RMSD of the backbone heavy
atoms of residues 6 to 83. The four major helices, α1 to α4, and the 3₁₀-helix are colored red and labeled at their N termini. (b) Ribbon presentation
of the closest conformer of nsP7 to the mean coordinates of the bundle shown in panel a, shown at the same viewing angle as for panel a. The
sequence positions at both ends of the four α-helices are identified. (c) Same as panel b after rotation about a vertical axis, such that one looks
at the edge of the up-down-up antiparallel three-α-helix sheet. The sequence positions at both ends of the helices α1, 3₁₀, α2, and α3 are identified.

The term “helical sheet” has previously been used to describe
helix packing in large proteins, such as annexins, but in
these structures the helices nonetheless tend to form tightly
packed bundles. Considering the apparent novelty of the nsP7
fold, we further investigated the role of the side chain packing
in stabilizing this molecular architecture, which revealed distinctly
different interhelical side chain-side chain interactions
in the individual pairs of helices. The tightest association is
seen between α2 and α3. These antiparallel helices are closely
packed together, and their axes are oriented at an angle of 10°.
They associate by virtue of the interdigitation of side chains
that are separated by three or four residues in the sequence
and are therefore positioned in two ridges along each helix.
These residues are predominantly hydrophobic and form two
interdigitated layers that hold the helices together (Fig. 2). The
residues forming the layers are, on the one hand, Leu 30, Cys
34, and Ile 41 in the helix α2 and Ala 50, Leu 57, and Leu 61
in α3 and, on the other hand, Trp 31, His 38, and Leu 42 in α2
and Thr 47, Met 54, Leu 58, and Gln 65 in α3. These interactions
are shown explicitly in Fig. 2a and b and schematically in
the helical wheel plot of Fig. 2c, where the two layers are
indicated by the two green boxes. This type of interaction is
reminiscent of that observed in coiled coils, where the supercoiling
of helices results from the coiling of the ridges around
the helix axis (25, 27). However, in nsP7 the helices are relatively
short, a characteristic which, combined with slight distortions
of the helices at each end, allows the interhelix angle
to remain small and the helices to remain antiparallel rather
than coiling around each other.
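
The statement that residues spaced three or four positions apart line up in ridges can be illustrated with a helical wheel calculation. The sketch below assumes an ideal α-helix with 3.6 residues per turn, that is, a 100° rotation per residue; this is a textbook idealization rather than a value measured from the nsP7 coordinates, and the function name is hypothetical.

```python
DEGREES_PER_RESIDUE = 100  # ideal alpha-helix: 360 degrees / 3.6 residues per turn


def wheel_angle(residue, reference):
    """Angular position (degrees) of a residue on a helical wheel,
    measured relative to a reference residue in the same helix."""
    return ((residue - reference) * DEGREES_PER_RESIDUE) % 360


# The two layers of helix alpha2 named in the text.
LAYER_1 = [30, 34, 41]  # Leu 30, Cys 34, Ile 41
LAYER_2 = [31, 38, 42]  # Trp 31, His 38, Leu 42

if __name__ == "__main__":
    for label, layer in (("layer 1", LAYER_1), ("layer 2", LAYER_2)):
        angles = [wheel_angle(res, LAYER_1[0]) for res in layer]
        print(label, dict(zip(layer, angles)))
```

Under this idealization the residues of the first layer fall within 0° to 40° and those of the second within 80° to 120°, so each layer occupies a narrow stripe on the helix surface and the two stripes lie side by side on the face that contacts α3.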

Helix α1 (11 to 17) and the 3₁₀-helical turn following it (20
to 22) associate closely with α2 and α3, and these three helices
are arranged similarly to other known three-helix bundles.
Interhelical side chain-side chain hydrophobic interactions appear
to be responsible for this association. Thus, Leu 15 and
Leu 19 of α1 associate with Val 35, His 38, Met 54, Val 55, and
Leu 58 of α2 and α3, while Leu 16 of α1 and Val 24 of the first
loop contact the indole ring of Trp 31 in α2 (Fig. 2a and c). Val
14 and Val 18 of α1 also contact the surface of helix α3 but are
partly solvent exposed. The observation that several of the
aforementioned residues involved in interactions between α1,
α2, and α3 are highly conserved in coronavirus nsP7 sequences
(Fig. 3) is consistent with the conclusion that they have a key
role in stabilizing the fold.

FIG. 2. (a) Top view of nsP7, generated from Fig. 1b by a 90° rotation about a horizontal axis in the projection plane. The helices are shown
in a ribbon representation and are labeled at their N termini. Side chains involved in interhelix interactions (see the text) are shown as stick
representations using the following color code: green, Ala, Leu, Val, Ile, Tyr, Trp, and Phe; blue, Ser, Thr, Cys, His, Asn, and Gln; red, Arg, Lys,
Asp, and Glu. Some of the side chains discussed in the text are identified with the one-letter amino acid code and the residue number. (b) View
of nsP7, generated from Fig. 1b by a 180° rotation about a vertical axis; the presentation is the same as for panel a. Some of the side chains discussed
in the text are labeled, and a hydrogen bond between Glu 52 and Gln 85 is shown as a thin black line. Panels a and b show wall-eye stereo views.
(c) Schematic top view of nsP7 (same as for panel a) with the helices represented as helical wheels. The helices α2 and α4 are directed from N
to C into the page, and α3 runs out of the plane toward the viewer. α1 and the 3₁₀-helical turn following it are represented as one helical wheel
running out of the page; this presentation does not show the tilt of α1 relative to the other three helices (Fig. 1). The side chains are represented
as circles, with the hydrophobic side chains shaded. The side chains involved in α2-α3 interactions are shown on a green background, α3-α4
interactions are on a yellow background, and α1-α2,α3 interactions are on a blue background. Glu 52 is shown on a red background to indicate
the hydrogen bonding interaction with the C-terminal residue, Gln 85. The helical wheel plot was prepared with the Web-based tool at
http://kael.net.

TABLE 1. Input for the structure calculation and characterization
of the energy-minimized bundle of 20 DYANA conformers of nsP7

Parameter                                   Valueᵃ
------------------------------------------  ----------------------
NOE upper distance limits                   1,066
Dihedral angle constraints                  96
Residual target function (Å²)               1.73 ± 0.30
Residual NOE violations
    No. >0.1 Å                              25 ± 4
    Maximum (Å)                             0.17 ± 0.10
Residual dihedral angle violations
    No. >2.5°                               1 ± 1
    Maximum (°)                             3.21 ± 1.02
Amber energies (kcal/mol)
    Total                                   -2,995.65 ± 157.74
    van der Waals                           -172.00 ± 31.55
    Electrostatic                           -3,566.95 ± 103.42
RMSD from ideal geometry
    Bond lengths (Å)                        0.0086 ± 0.0022
    Bond angles (°)                         2.257 ± 0.171
RMSD to the mean coordinates (Å)ᵇ
    bb (6-83)                               0.89 ± 0.19
    ha (6-83)                               1.35 ± 0.17
Ramachandran plot statistics (%)ᶜ
    Most favored regions                    65
    Additional allowed regions              30
    Generously allowed regions              3
    Disallowed regions                      2

ᵃ Except for the top two entries, the average values for the 20 energy-minimized
conformers with the lowest residual DYANA target function values and
the standard deviation among them are listed, with the ranges indicating
minimum and maximum values.

ᵇ bb indicates the backbone atoms N, Cα, C′; ha stands for “all heavy atoms.”
The numbers in parentheses indicate the residues for which the RMSD was
calculated.

ᶜ Determined by PROCHECK (16, 20).


Helix α4 appears to be only weakly associated with α3.
These two helices are connected by a five-residue loop, in
which Ala 67 and Ile 70 associate with Leu 62 of α3. The
N-terminal end of α4 is therefore in close proximity to α3,
while the C-terminal end of α4 is displaced away from the N
terminus of α3. The C-terminal residue, Gln 85, is inserted
between α3 and α4 and forms a hydrogen bond to the Glu 52
side chain of α3 (Fig. 2b). Helix α4 is predominantly polar,
with Leu 78 being the only hydrophobic residue pointing toward
α3 (Fig. 2b). Polar and electrostatic interactions involving
Glu 52, Lys 53, Ser 59, Cys 74, Asp 79, Arg 81, and the
C-terminal carboxylate group may also contribute to stabilizing
the α3-α4 packing arrangement. The lack of sequence conservation
in helix α4 (Fig. 3) would be consistent with the assumption
that the α3-α4 association can be supported by a variety of
different interactions.

The positioning of the C-terminal tripeptide segment of
nsP7 between the helices α3 and α4 is intriguing, since it might
actually serve to stabilize this part of the fold. The residues Leu
84 and Gln 85 form part of a conserved proteolytic cleavage
site and are stringently required for cleavage of the polyprotein
by the 3C-like protease (3CLpro). The crystal structure of
3CLpro with a bound inhibitor (40) shows that the side chain
of the glutamine residue, which is absolutely required for proteolysis,
inserts into a conserved pocket in the protease active
site where it forms hydrogen bonds to histidine and glutamate
residues. The Leu residue also interacts with the protease,
although it is partially solvent exposed, and SARS-CoV
3CLpro has relaxed specificity at this position in that it also
tolerates other hydrophobic residues (38). In the solution
structure of nsP7, the glutamine residue would be nearly inaccessible
to interactions with other proteins, and therefore the
observed conformation is likely to be assumed only after release
from the polyprotein.

[Fig. 3 alignment panel; rows: SARS-CoV, BCoV, MHV A59, HCoV OC43, PEDV, HCoV 229E, TGEV, HCoV NL63, IBV; columns labeled at positions 52 (α3), 62, 72 (α4), and 82]

FIG. 3. Multiple alignment of coronavirus nsP7 sequences. Strictly conserved positions are indicated in boldface type. The positions of the
helices α1 to α4 and 3₁₀ are indicated by boxes. Abbreviations and GenBank accession codes for the nsP7 sequences used are as follows:
SARS-CoV, SARS coronavirus, strain Tor2, NP_828865; PEDV, porcine epidemic diarrhea virus, strain CV777, NP_839961; HCoV 229E, human
coronavirus 229E, NP_835348; TGEV, transmissible gastroenteritis virus, strain Purdue, NP_840005; BCoV, bovine coronavirus ENT, NP_742134;
MHV A59, murine hepatitis virus, strain A59, NP_740612; HCoV OC43, human coronavirus OC43, strain ATCC VR-759, NP_937947; HCoV
NL63, human coronavirus NL63 strain Amsterdam I, YP_003766; IBV, avian infectious bronchitis virus, strain Beaudette, NP_740625. The residue
numbering of nsP7 is according to the construct used in this study (see text), with the nsP7 sequence in positions 3 to 85. The sequence positions
3, 12, 22, 32, etc. are labeled (corresponding to residues 1, 10, 20, 30, etc. of the nsP7 sequence).

FIG. 4. Wall-eye stereo views of all-heavy-atom presentations of the same conformer of nsP7 as that shown in Fig. 1b. (a) Viewing angle as in
Fig. 1b. (b) Viewing angle as in Fig. 1c. Color code: green, hydrophobic side chains; blue, all other side chains; red, polypeptide backbone. Some
of the side chains contributing to surface features discussed in the text are identified with the one-letter amino acid code and the residue number.

[Fig. 5, panel b residue labels: E25, D46, E49, E52, D69, E76]

FIG. 5. Surface views of nsP7 in a space-filling presentation. (a) Same orientation as in Fig. 1b. (b) View after a 180° rotation about a vertical
axis, showing the surface formed by the flat three-helix sheet. Color code: gray, hydrophobic and polar residues; red, negatively charged; blue,
positively charged. Some of the surface side chains discussed in the text are identified with the one-letter amino acid code and the residue number.
The residues 26 to 28, which are discussed in the text as a putative functional site, are identified in green.

In an inspection of the presentation of nsP7 shown in Fig. 4,
the aforementioned helix-helix interactions can quite readily
be observed. Furthermore, a space-filling surface presentation
reveals that the residues Lys 9, Arg 23, and Arg 81 lend positive
charge to one face of the protein, including the groove between
helix α1 and the helical sheet (Fig. 5a). On the opposite side,
the flat surface formed by the three-helix sheet of α2 to α4 is
divided nearly evenly into negatively charged and hydrophobic
patches (Fig. 5b). The negatively charged surface areas contain
the side chains of the residues Asp 46, Glu 49, Asp 69, Glu 75,
Glu 76, and Asp 79. A large hydrophobic patch is formed by
the partly or completely surface-exposed side chains of the
residues Leu 30, Leu 37, Ile 41, Leu 57, Val 60, Leu 61, and
Met 64 in the helices α2 and α3, some of which also participate
in the interhelix interactions described above. Both areas
would seem to be potential sites for protein-protein interactions.

Studies of other coronaviruses have demonstrated the involvement
of nsP7 in viral replicase complexes and in specific
interactions with other nonstructural proteins (2, 4, 44). Given
the lack of functional information regarding nsP7, we employed
bioinformatics techniques to search for possible functional
sites. The serine residues 26 to 28 in the exposed loop
connecting the helices α1 and α2 were thus identified as a
likely functional site by the ConSurf algorithm (6), based on
strong sequence conservation and surface exposure. These
three residues were also identified as part of known active sites
by a search with the PINTS server (36); however, other surface
features were not similar enough to infer a unique function.
This apparent failure to relate nsP7 with functional properties
of related proteins leaves us at present with the possibility that
the unique sequence and structure of nsP7 are the basis for an
as-yet-unrecognized, novel functional role unique to the Coronaviridae.